Abstract. Inhomogeneity in networks can be detected by analyzing the correlation between the degree
of a node and the total degree of its nearest neighbors. This is illustrated by two models. The first
is a random multi-partition network, for which the Aboav-Weaire law, which predicts a linear
relationship between the degree of a node and the total degree of its nearest neighbors, is extended.
The second is a preferential attachment network with two partitions, which shows scale-free
properties with a power-law tail of exponent $\gamma$ in the range $2 < \gamma < 3$. When the total
neighbor degree is plotted versus the degree of each node, the scatter plot shows separable clusters,
which are evidence of inhomogeneity in the network. The effectiveness of this new tool for detecting
inhomogeneity is demonstrated on real bipartite networks. Using this method, some interesting groups
of nodes in semantic and WWW networks have been found.
1. Introduction

Many abstract structures and complex systems can be conveniently described by networks.
Examples include the World Wide Web, power grids, food webs, word co-occurrence and protein
interaction networks [1, 2]. Depending on the system, some of them are naturally modeled as
bipartite networks, such as the movie-actor network [2], paper coauthorship [3, 4], protein
interaction [5], sexual relationships [6, 7], music sharing [8] and soccer championships [9]. One
common tool to analyze a bipartite network is to project it onto whichever set of nodes is of
interest to the specific investigation. In the projection, edges are usually reconnected as a
complete subgraph [10, 11], possibly with some weighting scheme [12]. However, information is
inevitably lost in any projection method, and the projection can even be undesirable, such as
projecting a sexual network onto a female-only or male-only unipartite network. Thus, some analyses
have been carried out on the bipartite network directly [13, 14]. In general, a network with two
partitions can have internal links, and its resemblance to a bipartite network can be measured by
bipartivity [14, 15]. In real-world networks, the different types of nodes may not be known
explicitly, so it is interesting to classify nodes into different groups [16]. A similar problem is
community detection, which focuses on classifying nodes into different partitions by optimizing
modularity [17, 18].

The local environment of a node can be described by some typical properties such as degree
and clustering coefficient. However, the clustering coefficient can be close to zero for a nearly
bipartite network. In this case, the degree of the neighbors can provide significant information.
The degree-neighbor degree correlation can be characterized by the Pearson correlation
coefficient [19], and it has been studied for networks with homogeneous neighbor degree by using
the Aboav-Weaire (AW) law [20, 21, 22]. The original work by Aboav and Weaire on two-dimensional
cellular networks showed an empirical linear relationship between the averaged total neighbor
degree and the node degree. Later, this law was generalized to random, scale-free and some
real-world networks [23]. The failure of this empirical observation thus provides a hint for
detecting local inhomogeneity in networks and possibly a method to group nodes together.

The aim of this paper is to study the correlation between the total degree of the nearest
neighbors and the degree of the node itself for networks with more than one partition. Through an
extension of the AW law, we propose a simple method to classify the nodes of nearly bipartite
networks and modular networks. The rest of this paper is organized as follows. In section 2, we
present a model of a random multi-partition network and derive the generalization of the
Aboav-Weaire law for this network. In section 3, we study a preferential attachment network with
two partitions and discuss the implications of this model for real-world networks. In section 4,
we examine the correlation between node degree and nearest neighbor degree in real-world networks,
both bipartite and non-bipartite. The results suggest an interesting finding for the WordNet and
WWW networks. Finally, a conclusion is given in section 5.

2. Multi-partitions random network 

We begin our study with a brief review of the Aboav-Weaire law. The AW law states that for
a node with degree $k$ and mean neighbor degree $M(k)$, the averaged total neighbor degree has
a linear relation with the node degree, given by $\langle kM(k) \rangle = Ak + B$, where $A$ and
$B$ are parameters that depend on the network. For a random network with degree distribution
$\mathcal{P}(k)$, the probability $Q(k)$ of selecting a nearest neighbor with degree $k$ is
proportional to the total degree of all nodes with degree $k$, or $N\mathcal{P}(k)k$. Hence, after
normalization, we get $Q(k) = \mathcal{P}(k)k/\langle k \rangle$ [24], and the mean degree of a
neighbor is $\int (\mathcal{P}(k)k/\langle k \rangle)\, k\, dk = \langle k^2 \rangle / \langle k \rangle$.
The Poisson degree distribution of a random network gives
$\langle k^2 \rangle = \langle k \rangle^2 + \langle k \rangle$ and so $A = \langle k \rangle + 1$ [23].
In addition, the parameter $B$ represents the assortativity of a network, and it takes the value
$B = 0$.
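
As a quick numerical check of the Poisson step above, the following minimal sketch evaluates $\langle k^2 \rangle / \langle k \rangle$ for a Poisson degree distribution and compares it with $\langle k \rangle + 1$. The function names and the truncation point `kmax` are illustrative assumptions and do not come from the paper.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of degree k, evaluated in log space for stability."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def mean_neighbor_degree(mean, kmax=None):
    """<k^2>/<k> for a Poisson degree distribution, truncated at kmax (assumed cutoff)."""
    if kmax is None:
        kmax = int(mean + 20 * math.sqrt(mean) + 20)
    m1 = sum(k * poisson_pmf(k, mean) for k in range(kmax + 1))
    m2 = sum(k * k * poisson_pmf(k, mean) for k in range(kmax + 1))
    return m2 / m1

# For <k> = 10 the AW slope A should be close to <k> + 1 = 11.
slope = mean_neighbor_degree(10.0)
```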

Now, we consider a random network consisting of $n$ different partitions, in which the
inter-connections are specified between each pair of partitions. Formally, a random $n$-partition
network with a set of nodes $V$ is composed of $n$ disjoint partitions of nodes
$\{V_1, \ldots, V_n\}$. For a network of size $|V| = N$, the size of each partition $V_i$ is
specified by the fraction of nodes $r_i = |V_i|/N$, such that $\sum_i r_i = 1$. The edges between
partitions are characterized by the probability matrix $P = (p_{ij})$, where $p_{ij}$ is the
probability that any two nodes in $V_i$ and $V_j$, respectively, are connected. Hence, the diagonal
entries are the self-linkage probabilities and the off-diagonal entries are the cross-linkage
probabilities. For this model, the mean degree contribution from partition $V_j$ to $V_i$ is
$p_{ij} r_j N$. By summing the degree contributions from all partitions, we get the mean degree of
the partition $V_i$:

$$\langle k \rangle_i = N \sum_j p_{ij} r_j \qquad (1)$$

where $\langle \cdot \rangle_i = \int \cdot\, \mathcal{P}_i(k)\, dk$ is the average taken over the
degree distribution of partition $V_i$. Here, we consider the model in the large-$N$ limit such that
$N \gg p_{ij} N \gg 1$, so the degree distribution of each partition is close to a continuous
Poisson distribution $\mathcal{P}_i(k)$ with mean degree $\langle k \rangle_i$. Moreover, the mean
degree $\langle k \rangle$ of the whole network is given by the weighted average of the mean degrees
of the partitions, $\langle k \rangle = \sum_i r_i \langle k \rangle_i$.
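
The two averages described above can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and the parameter values loosely follow the Figure 1 caption, with the diagonal entry of the second partition assumed equal to that of the first.

```python
def partition_mean_degrees(N, r, p):
    """Mean degree of each partition: <k>_i = N * sum_j p[i][j] * r[j]."""
    n = len(r)
    return [N * sum(p[i][j] * r[j] for j in range(n)) for i in range(n)]

def network_mean_degree(N, r, p):
    """Mean degree of the whole network: <k> = sum_i r[i] * <k>_i."""
    return sum(ri * ki for ri, ki in zip(r, partition_mean_degrees(N, r, p)))

N = 10000
r = [0.4, 0.6]
p = [[0.002, 0.01],
     [0.01, 0.002]]   # p[1][1] = 0.002 is an assumption
k_part = partition_mean_degrees(N, r, p)   # approximately [68, 52]
k_all = network_mean_degree(N, r, p)       # approximately 58.4
```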

The nearest neighbor degree distribution $Q_i(k)$ is similar to the case of the simple random
network: a randomly selected neighbor located in partition $V_j$ has degree distribution
$\mathcal{P}_j(k)k/\langle k \rangle_j$. A node $u$ in the partition $V_i$ with degree $k_u$ has,
on average, $k_u (p_{ij} r_j N / \langle k \rangle_i)$ edges connected to partition $V_j$.
Therefore, the distribution $\mathcal{P}_j(k)k/\langle k \rangle_j$ of each partition has to be
weighted by a factor proportional to $p_{ij} r_j$. After normalization, we have

$$Q_i(k) = \frac{\sum_j p_{ij} r_j\, \mathcal{P}_j(k)\, k / \langle k \rangle_j}{\sum_j p_{ij} r_j} \qquad (2)$$

For this random network, Eq. (2) is a very good approximation because there is no strong degree
correlation. The result is shown in Fig. 1a, which plots a simulation result for $Q_i(k)$ together
with the prediction using the theoretical values of $\mathcal{P}_j(k)$ and $\langle k \rangle_j$.
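
The weighted mixture described above can be sketched numerically as follows. This is a minimal illustration, assuming Poisson degree distributions for every partition; the function names and the parameter values in the usage lines are assumptions chosen for the example.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of degree k, evaluated in log space."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def neighbor_degree_distribution(i, k, N, r, p):
    """Q_i(k): degree distribution of a neighbor of a node in partition i.

    Each partition j contributes its size-biased distribution P_j(k) k / <k>_j,
    weighted by p[i][j] * r[j] and normalized by the sum of the weights.
    """
    n = len(r)
    means = [N * sum(p[a][b] * r[b] for b in range(n)) for a in range(n)]
    weights = [p[i][j] * r[j] for j in range(n)]
    mixture = sum(w * poisson_pmf(k, means[j]) * k / means[j]
                  for j, w in enumerate(weights))
    return mixture / sum(weights)

# Usage with illustrative parameters (assumed, two partitions).
N, r = 10000, [0.4, 0.6]
p = [[0.002, 0.01], [0.01, 0.002]]
norm = sum(neighbor_degree_distribution(0, k, N, r, p) for k in range(301))
mean_nb = sum(k * neighbor_degree_distribution(0, k, N, r, p) for k in range(301))
```

Here `norm` should be close to one, and `mean_nb` is the mean neighbor degree seen from the first partition.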

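For this model, the nodes in the same partition have a homogeneous local environment. Hence,
the linear relation of the total neighbor degree is expected to hold for each partition
separately, with the form $\langle kM(k) \rangle_i = A_i k + B_i$. For this random network,
$B_i = 0$ and the mean neighbor degree for the partition $V_i$ is

$$\langle M(k) \rangle_i = \int k\, Q_i(k)\, dk = \frac{\sum_j p_{ij} r_j\, \langle k^2 \rangle_j / \langle k \rangle_j}{\sum_j p_{ij} r_j} \qquad (3)$$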

Therefore, the slope between the total neighbor degree and the node degree is
$A_i = \langle M(k) \rangle_i$. In Fig. 1b, the prediction using the above equation shows a good
fit with the simulation result. The figure also shows that the points for the two partitions
separate into two clusters, while the degree distributions of the two partitions collapse together,
as shown in the inset of Fig. 1b. Hence, different types of nodes in the network can be revealed by
the total neighbor degree. Note that the linear result is not valid for the whole network, because
this inhomogeneous local environment can only result in a non-linear curve, which is the
superposition of the two lines in the figure. Beyond the nearly bipartite network, the clear
separation also holds for modular networks with strong internal links, because there can still be
a large difference in degree between the internal nodes and the external nodes.

Figure 1. (a) The nearest neighbor degree distribution $Q(k)$ of the random network with two
partitions. (b) Scatter plot of total neighbor degree $kM(k)$ vs $k$. $N = 10000$, $r_1 = 0.4$,
$p_{ii} = 0.002$, $p_{12} = p_{21} = 0.01$.
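
The separation of the two partitions can be reproduced on a small scale. The sketch below is an illustration under stated assumptions, not the simulation used for the figure: it uses a much smaller network (600 nodes) with larger link probabilities, a fixed random seed, and hypothetical function names. It estimates the slope of each partition as the mean neighbor degree averaged over its nodes and compares it with the Poisson prediction $\langle k \rangle_j + 1$ weighted by $p_{ij} r_j$.

```python
import random

def simulate_slopes(N, r1, p, seed=0):
    """Build a random two-partition network and return the mean neighbor degree per partition."""
    rng = random.Random(seed)
    n1 = int(round(N * r1))
    part = [0] * n1 + [1] * (N - n1)
    adj = [[] for _ in range(N)]
    for u in range(N):
        for v in range(u + 1, N):
            if rng.random() < p[part[u]][part[v]]:
                adj[u].append(v)
                adj[v].append(u)
    deg = [len(a) for a in adj]
    slopes = []
    for i in (0, 1):
        # Total neighbor degree divided by own degree, for nodes with at least one edge.
        ratios = [sum(deg[v] for v in adj[u]) / deg[u]
                  for u in range(N) if part[u] == i and deg[u] > 0]
        slopes.append(sum(ratios) / len(ratios))
    return slopes

def predicted_slopes(N, r, p):
    """Slopes A_i for Poisson partitions, where <k^2>_j / <k>_j = <k>_j + 1."""
    n = len(r)
    means = [N * sum(p[i][j] * r[j] for j in range(n)) for i in range(n)]
    out = []
    for i in range(n):
        w = [p[i][j] * r[j] for j in range(n)]
        out.append(sum(wj * (means[j] + 1) for j, wj in enumerate(w)) / sum(w))
    return out

p = [[0.03, 0.15], [0.15, 0.03]]          # assumed small-scale parameters
sim = simulate_slopes(600, 0.4, p)
pred = predicted_slopes(600, [0.4, 0.6], p)
```

With these parameters the two predicted slopes differ by roughly 20 percent, so the two partitions should appear as separate branches even though the network is small.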

3. Preferential attachment network with two partitions 

Most real-world networks exhibit scale-free behavior, in which the degree distribution follows
a power law in the high-degree region. This property has been studied extensively, and the
representative model is the BA network introduced by Barabási and Albert [2], which is constructed
by the mechanisms of growth and preferential attachment. Here, we propose a model using a similar
mechanism together with different types of nodes labeled explicitly. In addition to the scale-free
property, the nearest neighbor degree of this model exhibits a rich local behavior that the
original BA model does not have.

Here, for simplicity, we focus on a two-partition network with growth and preferential
attachment. In this model, we specify the fraction of nodes in each partition by $r_i$, such that
$r_1 + r_2 = 1$, and the number of edges added at each time step by a symmetric matrix
$M = (m_{ij})$. We begin with a small network, such as a complete bipartite network or a two-node
network with one edge. At each time step, one new node is added to the network, belonging either
to the partition $V_1$ with probability $r_1$ or to the partition $V_2$ with probability $r_2$. In
this growth process, the prescribed partition size ratio $r_1 \approx |V_1|/N$ is maintained. For
every newly added node located in partition $V_j$, a fixed number of edges $m_{ij}$ is added
between the new node and each partition $V_i$ in the network. A node with higher degree has a
higher chance of being connected, so for an old node $v$ located in partition $V_i$, the
probability of being connected, $p_i(k_v)$, is linearly proportional to its own degree $k_v$,
normalized by the total degree of its own partition:

$$p_i(k_v) = \frac{k_v}{\sum_{w \in V_i} k_w} \quad \text{for } v \in V_i \qquad (4)$$
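
The growth rule can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the complete bipartite seed whose side length equals the largest entry of $M$, and the choice of distinct targets within a time step are all assumptions. Degree-proportional selection within a partition, as in Eq. (4), is implemented by drawing uniformly from a list that contains each node once per unit of degree.

```python
import random

def grow_two_partition_network(T, r1, m, seed=0):
    """Grow a two-partition preferential attachment network for T time steps.

    m[i][j] is the number of edges from a new node in partition j to existing
    nodes in partition i (m is assumed symmetric). Returns (part, deg): the
    partition label and the degree of every node.
    """
    rng = random.Random(seed)
    s = max(max(row) for row in m)      # seed size per side (assumption)
    part = [0] * s + [1] * s
    deg = [0] * (2 * s)
    stubs = [[], []]                    # node ids, repeated once per unit of degree
    for u in range(s):                  # complete bipartite seed network
        for v in range(s, 2 * s):
            deg[u] += 1
            deg[v] += 1
            stubs[0].append(u)
            stubs[1].append(v)
    for _ in range(T):
        j = 0 if rng.random() < r1 else 1
        new = len(part)
        targets = []
        for i in (0, 1):
            chosen = set()
            while len(chosen) < m[i][j]:
                # Uniform draw from the stub list = probability proportional to degree.
                chosen.add(rng.choice(stubs[i]))
            targets.append(chosen)
        part.append(j)
        deg.append(0)
        for i in (0, 1):
            for v in targets[i]:
                deg[v] += 1
                deg[new] += 1
                stubs[i].append(v)
                stubs[j].append(new)
    return part, deg

# Bipartite example with the edge numbers of the Figure 2 caption, on a smaller network.
part, deg = grow_two_partition_network(2000, 0.2, [[0, 10], [10, 0]])
```

In this bipartite example every time step adds exactly 10 cross edges, so each partition gains 10 units of degree per step regardless of where the new node lands.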



Having described the model, we now look at the evolution of the degree of a given node $u$ in
the partition $V_1$. A similar result for nodes in partition $V_2$ can be obtained by interchanging
the indices 1 and 2. From the point of view of an old node $u$, it gains, on average,
$m_{11} p_1(k_u)$ edges for a new node in $V_1$ and $m_{12} p_1(k_u)$ edges for a new node in
$V_2$. Hence, the change of degree for $u$ is given by the weighted sum of these two events:

$$k_u(t+1) - k_u(t) = r_1 [m_{11}\, p_1(k_u)] + r_2 [m_{12}\, p_1(k_u)] \qquad (5)$$

The probability $p_1(k_u)$ evolves with time, and its denominator is the total degree of the
partition $V_1$, given by:

$$\sum_{v \in V_1} k_v \approx (2 r_1 m_{11} + m_{12})\, t \qquad (6)$$

where the first term is the average degree contribution from a node added to partition $V_1$,
which happens with probability $r_1$ and counts $2m_{11}$ for each new node. The second term is the
degree contribution from the cross links, since both partitions gain $m_{12} = m_{21}$ degree for
each new node. We can now adopt the continuum approach and write the evolution equation for the
degree $k_u$:



$$\frac{dk_u(t)}{dt} = \frac{r_1 m_{11} + r_2 m_{12}}{2 r_1 m_{11} + m_{12}}\, \frac{k_u(t)}{t} \qquad (7)$$



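When a new node is added to the network at time $t_i$, it initially has degree
$m_1 = m_{11} + m_{12}$. Hence, the initial condition of the node $u$ is $k_u(t_i) = m_1$, and so
the evolving degree is:

$$k_u(t) = m_1 \left( \frac{t}{t_i} \right)^{1/\lambda_1} \quad \text{with } \lambda_1 = \frac{2 r_1 m_{11} + m_{12}}{r_1 m_{11} + r_2 m_{12}} \qquad (8)$$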

The degree distribution of the partition $V_1$ is therefore given by:

$$\mathcal{P}_1(k) \sim \lambda_1 m_1^{\lambda_1} k^{-\gamma_1} \qquad (9)$$

where $\gamma_1 = \lambda_1 + 1$ is the exponent of the power-law tail of $\mathcal{P}_1(k)$. A
similar result can be obtained for the partition $V_2$. Hence, we obtain two degree distributions
for the two partitions, with exponents $\gamma_1$ and $\gamma_2$, respectively. One can verify that
in the limiting case $r_2 = m_{12} = 0$, the result reduces to the BA model with $\gamma_1 = 3$.
Also, if the internal linkage $m_{ii}$ is strong for $i = 1, 2$, then $\gamma_i \approx 3$ for both
partitions, which is similar to the BA model because of the weak coupling between the two
partitions. By taking the partial derivatives of $\gamma_i$ with respect to the different
variables, it can be shown that $\gamma_i$ is a monotonic function of the variables $r_i$ and
$m_{ij}$ when $r_i \neq 0.5$. The implication of this result is that the exponent is
$2 < \gamma_i < 3$ for the region $0 < r_i < 0.5$ and $\gamma_i > 3$ for the region
$0.5 < r_i < 1$. Comparing the two degree distributions, the smaller partition has a slowly
decaying tail while the larger partition has a fast-decaying tail, as shown in the degree
distributions in Fig. 2a.
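
The step from Eq. (8) to Eq. (9) is not spelled out in the text. A minimal sketch, assuming the standard continuum argument used for the BA model (the arrival time $t_i$ of a node is uniformly distributed on $[0, t]$, and the seed nodes are neglected), is:

```latex
\begin{align*}
\Pr\left[k_u(t) < k\right]
  &= \Pr\left[ t_i > t \left(\frac{m_1}{k}\right)^{\lambda_1} \right]
   = 1 - \left(\frac{m_1}{k}\right)^{\lambda_1}, \\
\mathcal{P}_1(k)
  &= \frac{\partial}{\partial k} \Pr\left[k_u(t) < k\right]
   = \lambda_1 m_1^{\lambda_1} k^{-(\lambda_1 + 1)} .
\end{align*}
```

Under these assumptions the tail exponent is $\lambda_1 + 1$, and $\lambda_1 = 2$ recovers the BA value of 3.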

If we now consider the degree distribution of the whole network, the slowly decaying tail of
one partition can dominate the high-degree part, so that only the tail with exponent
$2 < \gamma < 3$ is observed, which is consistent with the measured exponents of most real-world
networks [1]. In this model, $\gamma < 3$ is the result of the existence of two classes of nodes,
and it implies that nodes of different types cannot be distinguished simply by measuring the
degree distribution. Hence, it is natural to ask whether some real networks have multiple classes
of nodes that are yet to be unraveled. On the other hand, even though the degree distribution is
scale free, this model suggests that the total nearest neighbor degree shows a different pattern
for each partition.

Figure 2. (a) Degree distribution $\mathcal{P}_i(k)$ for the partitions $V_1$ and $V_2$ of the
scale-free model with $N = 10000$, $r_1 = 0.2$, $m_{ii} = 0$, $m_{12} = 10$. (b) Scatter plot of
total neighbor degree $kM(k)$ vs $k$. (inset) The corresponding network with some internal links,
$m_{ii} = 3$.
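
The dependence of the tail exponent on the partition size can be checked with a few lines of code. The sketch below assumes the form of the exponent given by the continuum solution, $\gamma_i = \lambda_i + 1$ with $\lambda_i = (2 r_i m_{ii} + m_{ij}) / (r_i m_{ii} + r_j m_{ij})$ and $r_j = 1 - r_i$; the function name is hypothetical.

```python
def tail_exponent(r_i, m_ii, m_ij):
    """Power-law exponent gamma_i = lambda_i + 1 of partition i in the two-partition model."""
    lam = (2 * r_i * m_ii + m_ij) / (r_i * m_ii + (1.0 - r_i) * m_ij)
    return lam + 1.0

gamma_small = tail_exponent(0.2, 0, 10)   # smaller partition: 2.25, slowly decaying tail
gamma_large = tail_exponent(0.8, 0, 10)   # larger partition: about 6, fast-decaying tail
gamma_ba = tail_exponent(1.0, 5, 0)       # single partition, no cross links: BA value 3
```

For the whole network, the smaller of the two exponents governs the high-degree tail, which is why only an exponent between 2 and 3 would be seen in the aggregate distribution of this example.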

From the above discussion, we know that the plot of the total neighbor degree versus the
node degree splits into two branches. The result for a bipartite network without internal links
is shown in Fig. 2b. To find the neighbor degree distribution and the slope $A_i$, one way is to
use Eqs. (2) and (3) by assuming random connections: $p_{ii} = 2m_{ii}/(N r_i)$ for $i = 1, 2$ and
$p_{ij} = m_{ij}/(N r_i r_j)$ for $i \neq j$. For this bipartite network, the resulting neighbor
degree distribution is a good approximation, and the slopes of the total neighbor degree can be
roughly approximated by $A_1 \approx \langle k^2 \rangle_2 / \langle k \rangle_2$ and
$A_2 \approx \langle k^2 \rangle_1 / \langle k \rangle_1$, as shown by the two straight lines in
the figure. Nonetheless, when there are internal edges, the approximation of $A_i$ is not very
good, because the model with internal degree correlation cannot be treated as a simple random
network. As shown in the inset of Fig. 2b, with internal edges $m_{ii} = 3$, the points fluctuate
more widely than in the simple bipartite network. In this case, as expected, the linear relation
between degree and total neighbor degree fits less well for both partitions, and the points
deviate more from the lines. Therefore, the nodes of the two partitions are partially mixed and
less distinguishable from each other in the low-degree part.
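
The random-connection assumption above maps the edge numbers of the growth model onto link probabilities. A minimal sketch of that mapping follows; the function name is hypothetical and the example values are taken from the Figure 2 caption.

```python
def equivalent_link_probabilities(N, r, m):
    """Link probabilities of a random network with the same edge counts as the growth model.

    Diagonal: p_ii = 2 m_ii / (N r_i). Off-diagonal: p_ij = m_ij / (N r_i r_j).
    """
    n = len(r)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                p[i][j] = 2 * m[i][i] / (N * r[i])
            else:
                p[i][j] = m[i][j] / (N * r[i] * r[j])
    return p

p = equivalent_link_probabilities(10000, [0.2, 0.8], [[0, 10], [10, 0]])
# Mean degree of the first partition implied by these probabilities: N * p_12 * r_2.
k1 = 10000 * p[0][1] * 0.8
```

As a consistency check, `k1` equals $m_{12}/r_1 = 50$, the average degree that the smaller partition accumulates in the bipartite growth model.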

The distinctive separation between the two sets of nodes in Fig. 2b, especially in the region
of high degree or high total neighbor degree, implies that these nodes can be easily classified
into two groups. This two-branch phenomenon does not occur for the BA model [23], in which the
local environment is homogeneous for all nodes with the same degree. The points thus concentrate
on a single line predicted by the AW law, which resembles either branch in Fig. 2b. Hence, this
phenomenon can be used to classify nodes into different partitions, as we show in the next
section.
Figure 3. Scatter plot of total neighbor degree $kM(k)$ vs $k$: (a) the substrate and the
intermediate in the metabolic network (E. coli), (b) movies and actors in the actor network,
(c) WordNet, (d) California. The black lines are the fitting results for the two branches.



4. Degree-Neighbor degree correlation in real world network 

From the two models discussed in the previous sections, we know that the total neighbor degree
can be used to classify nodes. To test whether the branching phenomenon exists in real-world
networks, we examine both explicitly bipartite and non-bipartite networks. Undirected networks
are used in the simulation, and the results are shown below.
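
For an undirected network given as an edge list, the two quantities plotted against each other can be computed as in the following minimal sketch. The function name, the edge-list input format, and the decision to drop self-loops and duplicate edges are assumptions made for illustration.

```python
from collections import defaultdict

def total_neighbor_degree(edges):
    """Map each node to (k, kM(k)): its degree and the summed degree of its neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                  # ignore self-loops (assumption)
            adj[u].add(v)           # sets also discard duplicate edges
            adj[v].add(u)
    return {u: (len(nb), sum(len(adj[v]) for v in nb)) for u, nb in adj.items()}

# A star with center 0: the center has k = 3 and total neighbor degree 3,
# each leaf has k = 1 and total neighbor degree 3.
result = total_neighbor_degree([(0, 1), (0, 2), (0, 3)])
```

Plotting the second entry against the first for every node gives the scatter plot discussed in the text.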

The first example is the metabolic network [5], which is explicitly bipartite. It is an
interaction network composed of substrates and intermediate complexes. As shown in Fig. 3a, the
result for the metabolic network is similar to that of the preferential attachment model we
introduced, and two branches are clearly identified. A substrate can have a very high degree by
its own nature, so the tail of the degree distribution of the whole network is dominated by the
power-law tail of the substrates. In this case, the degree distribution of the intermediate nodes
is overshadowed by that of the substrate nodes.
Table 1. Results of the two-line fitting for different networks. $N_1$ and $N_2$ are the numbers of
nodes classified into the corresponding partitions by this method; $c$ is the number of correctly
classified nodes divided by the total number of nodes in the network.

Network                                   s_1     s_2     N_1      N_2      c
Random network in Fig. 1a                 55.3    65.4    4038     5962     1.00
Scale free network in Fig. 2b             13.3    746     1942     8058     1.00
Scale free network in inset of Fig. 2b    27.6    363     2013     7987     0.85
Metabolic E. coli                         4.24    106     766      1509     0.92
Actor-movie [2]                           18.8    64.7    383640   127823   0.57
WordNet network [26]                      4.14    75.0    34652    42191    -
California subnetwork [27]                11.2    46.4    2266     3909     -





However, in the scatter plot of the total neighbor degree, these two types of nodes are clearly
separated into two clusters. Another bipartite network tested is the actor-movie network [2, 25],
in which an edge represents a particular actor playing in a particular movie. For this network,
the degree distributions of the two partitions are close to each other. It is therefore expected
that the points of the total neighbor degree for these two sets mix in the low-degree region, as
shown in Fig. 3b. Nevertheless, outside the low-degree region the nodes can still be distinguished
clearly.

The examples discussed above are explicitly bipartite so, in some sense, their nodes should be
easily distinguishable. However, it is a challenge to classify nodes into different groups for
networks without any a priori knowledge of their origins. A similar method can be employed to
classify nodes by detecting the local inhomogeneity and the branching in the total neighbor
degree. One example in this category is the semantic network of the WordNet project [26], which
studies the semantic relationships between different English words. As shown in Fig. 3c, two
branches can be observed for this network. Inspection of the words in the network shows that the
steeper branch contains specialized words, while the other branch corresponds to generic words.
Even though specialized words have low degree, they can still have high total neighbor degree
because they are typically connected to generic words that have high degree. Another example is
the California web subgraph network [27, 28]. It is constructed by linking webpages together
according to the query results of a search engine. It is not an explicitly bipartite network, but
two different branches are clearly shown in Fig. 3d.

To quantify the observation, we perform a least-squares fit to find the best-fitting lines and
group the nodes. For a simple homogeneous network, the Aboav-Weaire law predicts that the data
points of average total neighbor degree versus degree fall on one single line. Thus, for a network
with two different partitions, we expect two clear straight lines. For the same reason discussed
for the random $n$-partition network, the $y$-intercept is usually small and we assume it to be
zero. Hence, we look for lines of the form $y = s_1 x$ and $y = s_2 x$, with the slopes $s_1$ and
$s_2$ as the fitting parameters, such that the squared distance between the points and the two
lines is minimized:

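$$E(s_1, s_2) = \sum_i \min\left( d_{i1}^2,\, d_{i2}^2 \right)$$

where $d_{ij}$ is the distance from data point $i$ to the line $y = s_j x$.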

Here, we obtain the best-fitting slopes $s_1$ and $s_2$ by minimizing $E(s_1, s_2)$. This method
provides a simple classification of nodes into two groups. If the local environment of a network
is homogeneous, then there may be only one group of nodes, and the resulting $s_1$ and $s_2$ should
take values close to each other. The corresponding fitting results are plotted as two black
straight lines in Fig. 3. Moreover, the fitting and classification results are shown in Table 1
for the networks used in this paper. From the table, we can see that the classification is very
good for the models we studied, even for the preferential attachment network with internal
linkages. For the real-world bipartite networks, the classification is acceptable for the
metabolic network and the actor-movie network. For the non-bipartite networks, the fitted lines
represent the two branches very well in Fig. 3. In addition, the large difference between the
values of $s_1$ and $s_2$ signifies that these networks are better described by two branches, and
so we can classify their nodes into two groups.
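
One possible way to carry out such a two-line fit is an alternating scheme similar to k-means: assign each point to the nearer line, refit each slope by least squares through the origin, and repeat until the slopes stop changing. The sketch below is an assumption-laden illustration rather than the procedure used in the paper: it measures distance by the vertical residual $(y - s x)^2$, initializes the slopes from quartiles of $y/x$, and uses hypothetical function names. It converges to a local minimum only.

```python
def assign_to_lines(points, s):
    """Label each point 0 or 1 according to the line y = s[j] x with the smaller squared residual."""
    return [0 if (y - s[0] * x) ** 2 <= (y - s[1] * x) ** 2 else 1 for x, y in points]

def fit_two_lines(points, iterations=50):
    """Fit two lines through the origin to (x, y) points; returns (s1, s2, labels)."""
    ratios = sorted(y / x for x, y in points if x > 0)
    # Initial slopes from the lower and upper quartiles of y/x (assumption).
    s = [ratios[len(ratios) // 4], ratios[(3 * len(ratios)) // 4]]
    for _ in range(iterations):
        labels = assign_to_lines(points, s)
        new_s = []
        for j in (0, 1):
            sxy = sum(x * y for (x, y), l in zip(points, labels) if l == j)
            sxx = sum(x * x for (x, y), l in zip(points, labels) if l == j)
            # Least-squares slope through the origin; keep the old slope for an empty group.
            new_s.append(sxy / sxx if sxx > 0 else s[j])
        if new_s == s:
            break
        s = new_s
    return s[0], s[1], assign_to_lines(points, s)

# Synthetic example: 20 points on y = 2x and 20 points on y = 10x.
pts = [(x, 2.0 * x) for x in range(1, 21)] + [(x, 10.0 * x) for x in range(1, 21)]
s1, s2, labels = fit_two_lines(pts)
```

In this scheme the label of each node is simply the index of the nearer fitted line, which corresponds to the grouping of nodes described in the text.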

5. Summary 

In summary, we have studied the nearest neighbor degree correlation for the random
multi-partition network, the preferential attachment network and some real-world networks.
Through the analysis of the extended AW law, the exact neighbor degree distribution is computed
for the random multi-partition network. Furthermore, we show that there is a linear relationship
between the total neighbor degree and the node degree for each partition separately, but not for
the whole network. This phenomenon is especially distinct for the preferential attachment network,
which also models the scale-free property with $2 < \gamma < 3$. The clustering of points in the
scatter plot of total neighbor degree versus degree therefore suggests a way to classify nodes
into different groups. By applying this classification scheme to the models studied and to real
bipartite networks, we show that the grouping of nodes is satisfactory. We also find interesting
subsets of nodes in the WordNet and California subgraph networks, which are not bipartite.